Random Forests


In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
from sklearn.model_selection import train_test_split

In [ ]:
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

In [ ]:
from sklearn.ensemble import RandomForestClassifier
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=1)
# Fix random_state so the fitted forest and its importances are reproducible
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

In [ ]:
rf.feature_importances_

In [ ]:
pd.Series(rf.feature_importances_,
          index=cancer.feature_names).plot(kind="barh")

Exercise

Use a random forest classifier or random forest regressor on a dataset of your choice. Try different values of n_estimators and max_depth and see how they impact performance and runtime. Tune max_features with GridSearchCV.
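
One possible starting point for the exercise is sketched below. It reuses the breast cancer data from the cells above. The specific values tried for n_estimators, max_depth and max_features, the 5-fold cross-validation, and the random_state settings are illustrative assumptions, not recommended defaults.

```python
import time

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=1)

# Record fit time and test accuracy for a few n_estimators / max_depth pairs.
# The values below are arbitrary choices for illustration.
results = {}
for n_estimators in [10, 50, 100]:
    for max_depth in [2, None]:  # None lets each tree grow until leaves are pure
        start = time.perf_counter()
        model = RandomForestClassifier(
            n_estimators=n_estimators, max_depth=max_depth, random_state=0)
        model.fit(X_train, y_train)
        fit_seconds = time.perf_counter() - start
        results[(n_estimators, max_depth)] = (
            fit_seconds, model.score(X_test, y_test))

for (n_estimators, max_depth), (fit_seconds, accuracy) in results.items():
    print(f"n_estimators={n_estimators}, max_depth={max_depth}: "
          f"fit {fit_seconds:.3f}s, test accuracy {accuracy:.3f}")

# Tune max_features with cross-validation on the training set only.
# The dataset has 30 features, so 30 means every split considers all of them.
param_grid = {"max_features": [2, 5, 10, 30]}
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid, cv=5)
grid.fit(X_train, y_train)

print("best max_features:", grid.best_params_["max_features"])
print("mean CV accuracy:", grid.best_score_)
print("test accuracy:", grid.score(X_test, y_test))
```

Timings depend on the machine, and on a dataset this small the accuracy differences between settings may be within noise, so treat the printed numbers as a rough guide rather than a ranking.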